ABSTRACT

Optimum sample size is an essential component of any research. The main purpose of the sample size calculation is to 
determine the number of samples needed to detect significant changes in clinical parameters, treatment effects or 
associations after data gathering. It is not uncommon for studies to be underpowered and thereby fail to detect the 
existing treatment effects due to inadequate sample size. In this paper, we explain briefly the basic principles of sample 
size calculations in medical studies. 
Introduction

Sample size calculation, or sample size
justification, is one of the first steps in designing a
clinical study. The sample size is the number of
patients or other investigated units that will be
included in a study and that are required to answer
the research hypothesis. The main
purpose of the sample size calculation is to
determine the number of units sufficient to
estimate the unknown clinical parameters or to
detect the treatment effects or associations after
data gathering.

If the sample size is too small, the investigator
may not be able to answer the study question. On
the other hand, the number of patients in many
studies is limited by practicalities such as cost,
patient inconvenience, decisions not to proceed
with an investigation, or a prolonged study time.
Investigators should calculate the optimum sample
size before data gathering, both to avoid the mistakes
caused by too small a sample and to avoid wasting
money and time on too large a sample.
In addition, sample size calculations for research
projects are an essential part of a study protocol
for submission to ethical committees and to some
peer-reviewed journals (1). It is very important to
determine the sample size according to the study
design and the objectives of the study. Mistakes
in the calculation of the sample size
can lead to incorrect or insignificant results (2). In
this paper, we briefly explain the basic principles
of sample size calculations in medical studies.

Assumptions for sample size 
calculation 

Calculating the sample size requires some
assumptions, including the variability of the outcome,
the type I and type II errors, and the smallest effect of
interest.

Outcome's variability 

The variability in the outcome variable is the
population variance of a given outcome, which is
estimated by the standard deviation. Investigators
can use an estimate obtained from a pilot study or
the variation reported in previous studies.

The type I and type II errors 

The type I error is the rejection of a true null
hypothesis and the type II error is the failure to reject
a false null hypothesis. In other words, the type I
error corresponds to the level of confidence in the
sample size calculation, which is the degree of
uncertainty or probability that a sample value lies
outside stated limits (2), and the type II error
corresponds to power, which is the ability
of a statistical test to reject a false null
hypothesis. Power analysis can be used to
calculate the minimum sample size needed for the
investigator to detect an effect of a given size.

Effect size 

The effect size is the minimal difference
between the studied groups that the investigator
wishes to detect, or the difference between the
estimate and the unknown parameter that the
investigator wants to estimate. In other words, the
investigator states that a discrepancy between the
sample estimate and the true population value is
acceptable as long as it stays within a certain amount.
This amount is called the minimum effect size.
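
The three assumptions above (variability, error rates and effect size) combine in the standard normal-approximation formula for comparing the means of two equal-sized groups. This formula is not given in this paper; it is shown only as an illustrative sketch, and the numerical values used (sigma = 10, delta = 5, alpha = 0.05, power = 80%) are hypothetical.

```latex
% n: patients per group
% \sigma: standard deviation of the outcome (variability)
% \delta: smallest difference of interest (effect size)
% \alpha: type I error; \beta: type II error (power = 1 - \beta)
\[
  n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\delta^{2}}
\]

% Hypothetical worked example: \alpha = 0.05, power = 0.80, \sigma = 10, \delta = 5
\[
  n = \frac{2\,(1.96 + 0.84)^{2}\,(10)^{2}}{5^{2}} = 62.72 \approx 63 \text{ per group}
\]
```

Halving the effect size to delta = 2.5 would quadruple the required sample, which is why the smallest effect of interest has to be fixed before data gathering.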

Sample size calculation in cross-sectional studies 

In cross-sectional studies the aim is to estimate
unknown parameter(s), such as the prevalence of a
condition, in the target population using a random
sample. An adequate sample size is therefore needed
to estimate the population prevalence with good precision.

There is a simple formula for calculating this
adequate sample size. However, selecting values for
the assumptions required in the formula raises
practical issues, and in some situations the decision
about the appropriate values for these assumptions
is not simple (3). The following formula can be used
for calculating the adequate sample size in a
prevalence study (4):

\[ n = \frac{Z^{2} P (1 - P)}{d^{2}} \]

where n is the sample size, Z is the statistic
corresponding to the level of confidence, P is the
expected prevalence (which can be obtained from
similar studies or from a pilot study conducted by
the researchers), and d is the precision
(corresponding to the effect size).

The level of confidence usually aimed for is
95%; most researchers present their results with a
95% confidence interval (CI). However, researchers
who want to be more confident can choose
a 99% confidence interval.

Researchers need an assumed value of P to
use in the formula. It can be estimated
from previous studies published in the study
domain or from a pilot study with a small sample.
The assumed P is a very important issue because
the precision (d) should be selected according to
the size of P. There is little guidance on choosing
an appropriate d. Some authors recommend
selecting a precision of 5% if the prevalence of the
disease is expected to be between 10% and 90%.
However, when the assumed prevalence is very
small (below 10%), a precision of
5% seems inappropriate. For example, if the
assumed prevalence is 1%, a precision of 5% is
obviously crude and may lead to an inappropriate
sample size (3). A conservative choice would be
one-fourth or one-fifth of the prevalence as the
precision in the case of small P. In
Table 1, we present sample size calculations for
three different values of P and three different
precisions. For P=0.05, the appropriate precision is
0.01, which results in 1825 samples. For P=0.2, the
best precision would be 0.04, and when P increases
to 0.6, the precision could increase to 0.1 (or
more), which yields 92 samples. Investigators
should choose a precision appropriate to the
assumed P; the wrong precision yields the wrong
sample size (too small or too large).

Table 1. Sample size to estimate prevalence with
different precisions at 95% confidence

Precision      Assumed prevalence
               0.05      0.2       0.6

0.01           1825      6147      9220

0.04            114       384       576

0.10             18        61        92
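
The entries of Table 1 can be checked with a few lines of Python. This is a minimal sketch of the prevalence formula n = Z^2 P(1 - P) / d^2; the function name `prevalence_sample_size` is hypothetical, and the default Z = 1.96 with rounding to the nearest whole number is an assumption chosen because it reproduces the tabulated values.

```python
def prevalence_sample_size(p, d, z=1.96):
    """Sample size for estimating a prevalence.

    p: expected prevalence (0 < p < 1)
    d: desired precision (half-width of the confidence interval)
    z: statistic for the level of confidence (1.96 for 95%; assumption)
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    # n = Z^2 * P * (1 - P) / d^2, rounded to the nearest whole number
    return round(z ** 2 * p * (1 - p) / d ** 2)


if __name__ == "__main__":
    # Rebuild the grid of Table 1: one row per precision, one column per prevalence
    for d in (0.01, 0.04, 0.10):
        row = [prevalence_sample_size(p, d) for p in (0.05, 0.2, 0.6)]
        print(d, row)
```

Rounding up instead of rounding to the nearest integer is the more conservative convention; it would change a few entries by one unit (for example 114 would become 115).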

Sample size calculation in case-control studies

The case-control study is a type of epidemiological
observational study. It is often used to identify risk
factors that may be associated with a disease by
comparing the risk factors in subjects who have
that disease (the cases) with subjects who do not
have the disease (the controls).

The sample size calculation for unmatched
case-control studies (the number of cases and controls)
needs the following assumptions: the assumed
proportions of cases and controls who were exposed
to the risk factor, taken from similar studies or from
a pilot study (researchers can also use the assumed
odds ratio, OR); the level of confidence (usually 95%);
and the proposed power of the study (usually at least
80%). Software and guide books provide
investigators with the formula or with
sample sizes tabulated under
different assumptions (5). Researchers should
remember, however, that in the presence of a significant
confounding factor (6) a larger sample size is
required. Because confounding variables must
be controlled for in the analysis, a more complex
statistical model is needed, and so a larger sample
is required to achieve significance.
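
As an illustration of how these assumptions enter a calculation, the sketch below uses a common normal-approximation formula for comparing two proportions. It is not the formula of reference (5), which this paper does not reproduce. It assumes an unmatched design with equal numbers of cases and controls and no adjustment for confounding; the function names and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p0, odds_ratio):
    """Exposure proportion among cases implied by the control exposure p0 and the OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def case_control_sample_size(p0, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (and of controls) for an unmatched study.

    p0: assumed proportion of controls exposed to the risk factor
    odds_ratio: smallest odds ratio the study should detect (must differ from 1)
    alpha: two-sided type I error (0.05 matches 95% confidence)
    power: 1 - type II error
    """
    if odds_ratio == 1:
        raise ValueError("odds_ratio must differ from 1")
    p1 = exposure_in_cases(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Unpooled-variance formula for the difference of two proportions
    n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p0 * (1 - p0)) / (p1 - p0) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: 20% of controls exposed, OR of 2 to be detected
    print(case_control_sample_size(p0=0.2, odds_ratio=2.0))
```

With these hypothetical inputs the sketch returns 169 cases and 169 controls. A design that must control for confounders would need more than this figure.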

Sample size in clinical trials 

In a clinical trial, if the sample size is too
small, a well-conducted study may fail to answer
its research hypothesis or may fail to detect
important effects and associations (6). The
minimum information needed to calculate the
sample size for a randomized controlled trial
includes the power, the level of significance, the
underlying event rate in the population and the
size of the treatment effect sought. In addition,
the calculated sample size should be adjusted
for other factors, including expected compliance
rates and, less commonly, an unequal allocation
ratio (7).
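
The adjustments mentioned above can be sketched as follows. This is a simplified illustration, not the method of reference (7): it treats non-compliance as a proportion of participants expected to be lost or non-evaluable, and it uses the standard inflation factor (1 + k)^2 / (4k) for a k:1 allocation ratio. The function name and the input values are hypothetical.

```python
import math


def adjust_total_sample_size(n_total_equal, dropout_rate=0.0, allocation_ratio=1.0):
    """Inflate a total sample size computed for equal allocation and full follow-up.

    n_total_equal: total size (both arms) from the unadjusted calculation
    dropout_rate: expected proportion lost or non-evaluable (assumption: 0 <= rate < 1)
    allocation_ratio: k in a k:1 allocation between the two arms (1.0 means equal)
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    if allocation_ratio <= 0:
        raise ValueError("allocation_ratio must be positive")
    k = allocation_ratio
    # Unequal allocation needs (1 + k)^2 / (4k) times the equal-allocation total
    inflated = n_total_equal * (1 + k) ** 2 / (4 * k)
    # Recruit enough participants so that the evaluable number still reaches the target
    return math.ceil(inflated / (1 - dropout_rate))


if __name__ == "__main__":
    # Hypothetical trial: 300 patients needed, 25% expected loss, 2:1 allocation
    print(adjust_total_sample_size(300, dropout_rate=0.25, allocation_ratio=2.0))
```

For equal allocation (k = 1) the inflation factor is exactly 1, so only the loss adjustment applies.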

There are some recommendations on sample size
for the different phases of clinical trials.
Phase I trials, which address drug safety in
human volunteers, might initially require a
total of around 20-80 patients. Phase II trials,
which investigate treatment effects, seldom
require more than 100-200 patients (8).

Conclusion 

Optimum sample size is an essential
component of any research (9). It is not
uncommon for studies to be underpowered and
to fail to detect treatment effects because of an
inadequate sample size (10). The calculation of an
adequate sample size is an important part of any
clinical study, and a professional statistician is the
best person to ask for help when planning a
research project (6). However, researchers must
provide the necessary information so that
the sample size can be determined according to
correct assumptions (1). Many statistical books
provide methods for
sample size calculation in medical studies (5),
and several software programs (11) and online
tools are available to help with sample size
calculations. Although these
programs are user-friendly, researchers should
consult an experienced statistician at the design
stage of their projects to avoid methodological
errors.